Similar compounds versus similar conformers: complementarity between PubChem 2-D and 3-D neighboring sets

Authors

  • Sunghwan Kim
  • Evan Bolton
  • Stephen H. Bryant
Abstract

BACKGROUND PubChem is a public repository for biological activities of small molecules. For the efficient use of its vast amount of chemical information, PubChem performs 2-dimensional (2-D) and 3-dimensional (3-D) neighborings, which precompute "neighbor" relationships between molecules in the PubChem Compound database, using the PubChem subgraph fingerprints-based 2-D similarity and the Gaussian-shape overlay-based 3-D similarity, respectively. These neighborings allow PubChem to provide the user with immediate access to the list of 2-D and 3-D neighbors (also called "Similar Compounds" and "Similar Conformers", respectively) for each compound in PubChem. However, because 3-D neighboring is much more time-consuming than 2-D neighboring and computational resources are limited, an important question is how much the results of the two neighboring schemes differ.

RESULTS The present study analyzed the complementarity between the PubChem 2-D and 3-D neighbors. When all compounds in PubChem were considered, the overlap between 2-D and 3-D neighbors was only 2% of the total neighbors. For the data sets containing compounds with annotated information, the overlap increased as the data sets became smaller. However, it did not exceed 31%, and substantial fractions of neighbors were still recognized by either PubChem 2-D or 3-D similarity, but not by both. The Neighbor Preference Index (NPI) of a molecule for a given data set was introduced to quantify whether the molecule had more 2-D or 3-D neighbors in that data set. The NPI histogram for all PubChem compounds had a bimodal shape with two maxima at NPI = ±1 and a minimum at NPI = 0. However, the NPI histograms for the subsets containing compounds with annotated information had a greater fraction of compounds with a strong preference for one neighboring method over the other (at NPI = ±1) as well as compounds with a neutral preference (at NPI = 0).
CONCLUSION The results of our study indicate that, for the majority of the compounds in PubChem, their structural similarity to other compounds can be recognized predominantly by either 2-D or 3-D neighborings, but not by both, showing a strong complementarity between 2-D and 3-D neighboring results. Therefore, despite its heavy requirements for computational resources, 3-D neighboring provides an alternative way in which the user can instantly access structurally similar molecules that cannot be detected if only 2-D neighboring is used.

Graphical Abstract: The binned distribution of the neighbor preference indices (NPIs) for all compounds in PubChem (left) has a bimodal shape with two maxima at NPI = ±1 and a minimum at NPI = 0, indicating that structural similarity between compounds in PubChem can be recognized predominantly by either 2-D or 3-D neighborings, but not by both. The NPI histogram for the drug space (right) has a greater fraction of compounds with a strong preference for one neighboring method over the other (at NPI ≈ ±1) as well as compounds with a neutral preference (at NPI ≈ 0), indicating that the drug space is very different from the PubChem space.
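
The two quantities discussed in the abstract, the 2-D/3-D neighbor overlap and the NPI, can be illustrated with a minimal Python sketch. The abstract does not give the NPI formula, so the normalized difference used here, (N3D − N2D) / (N3D + N2D), and its sign convention (+1 for only 3-D neighbors, −1 for only 2-D neighbors) are assumptions chosen to match the stated range of ±1 with a neutral value at 0. Measuring overlap against the union of both neighbor lists is likewise an assumption about what "total neighbors" means. The function names are hypothetical.

```python
def overlap_fraction(neighbors_2d, neighbors_3d):
    """Fraction of all neighbors (union of both sets) found by both methods.

    Assumption: "total neighbors" is the union of the 2-D and 3-D sets.
    """
    total = neighbors_2d | neighbors_3d
    if not total:
        return 0.0
    return len(neighbors_2d & neighbors_3d) / len(total)


def neighbor_preference_index(n_2d, n_3d):
    """Assumed NPI: normalized difference of the two neighbor counts.

    Returns +1.0 when a molecule has only 3-D neighbors, -1.0 when it has
    only 2-D neighbors, 0.0 when the counts are equal, and None when the
    molecule has no neighbors at all (the index is undefined there).
    """
    if n_2d + n_3d == 0:
        return None
    return (n_3d - n_2d) / (n_3d + n_2d)


# Toy data: neighbors of one hypothetical compound, labeled by letter.
similar_compounds = {"B", "C"}        # 2-D neighbors
similar_conformers = {"C", "D", "E"}  # 3-D neighbors

print(overlap_fraction(similar_compounds, similar_conformers))
print(neighbor_preference_index(len(similar_compounds), len(similar_conformers)))
```

Under these assumptions, a data set in which most compounds have neighbors from only one method produces NPI values piled up at ±1, which is the bimodal shape the abstract reports for all PubChem compounds.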

Similar resources

PubChem3D: Similar conformers

BACKGROUND PubChem is a free and open public resource for the biological activities of small molecules. With many tens of millions of both chemical structures and biological test results, PubChem is a sizeable system with an uneven degree of available information. Some chemical structures in PubChem include a great deal of biological annotation, while others have little to none. To help users, ...

NCBI News, April 2009

NCBI’s PubChem now features calculated three-dimensional conformers (3-D conformers) for a large proportion of the PubChem compound database (17 million records, 88%). In addition, conformers are clustered to provide a list of similar 3-D conformers. These similar conformers provide a more relevant and expanded set of compounds with potentially similar biological and pharmacological activity. P...

PubChem3D: a new resource for scientists

BACKGROUND PubChem is an open repository for small molecules and their experimental biological activity. PubChem integrates and provides search, retrieval, visualization, analysis, and programmatic access tools in an effort to maximize the utility of contributed information. There are many diverse chemical structures with similar biological efficacies against targets available in PubChem that a...

Effects of multiple conformers per compound upon 3-D similarity search and bioassay data analysis

BACKGROUND To improve the utility of PubChem, a public repository containing biological activities of small molecules, the PubChem3D project adds computationally-derived three-dimensional (3-D) descriptions to the small-molecule records contained in the PubChem Compound database and provides various search and analysis tools that exploit 3-D molecular similarity. Therefore, the ef...

PubChem3D: Shape compatibility filtering using molecular shape quadrupoles

BACKGROUND PubChem provides a 3-D neighboring relationship, which involves finding the maximal shape overlap between two static compound 3-D conformations, a computationally intensive step. It is highly desirable to avoid this overlap computation, especially if it can be determined with certainty that a conformer pair cannot meet the criteria to be a 3-D neighbor. As such, PubChem employs a ser...

Journal title:

Volume 8  Issue

Pages  -

Publication date 2016